Fix crash in def_readwrite for non-smart-holder properties of smart-holder classes (v2)#6008
Fix crash in def_readwrite for non-smart-holder properties of smart-holder classes (v2)#6008virtuald wants to merge 6 commits intopybind:masterfrom
Conversation
- Occurs with non-smart-holder property of smart-holder class
|
@virtuald @oremanj I think this PR is definitely the way to go (suggestion to close PR #6003 in favor of this PR). Below is a Opus 4.6 1M Thinking analysis of the fix. I think we should merge this fix as-is, because it's a strict improvement, and maybe do other things in follow-on PRs. For this PR: I still need to look carefully at the tests. The bugWhen a smart-holder class ( The fixIn static handle
cast(const std::shared_ptr<type> &src, return_value_policy policy, handle parent) {
const auto *ptr = src.get();
typename type_caster_base<type>::cast_sources srcs{ptr};
if (srcs.creates_smart_holder()) {
return smart_holder_type_caster_support::smart_holder_from_shared_ptr(
src, policy, parent, srcs.result);
}
auto *tinfo = srcs.result.tinfo;
if (tinfo != nullptr && tinfo->holder_enum_v == holder_enum_t::std_shared_ptr) {
return type_caster_base<type>::cast_holder(srcs, &src);
}
if (parent) {
return type_caster_base<type>::cast(
srcs, return_value_policy::reference_internal, parent);
}
throw cast_error("Unable to convert std::shared_ptr<T> to Python when the bound type "
"does not use std::shared_ptr or py::smart_holder as its holder type");
}Three cases:
Why the
|
include/pybind11/cast.h
Outdated
|
|
||
| if (parent) { | ||
| return type_caster_base<type>::cast( | ||
| srcs, return_value_policy::reference_internal, parent); |
There was a problem hiding this comment.
With reference to my previous comment: I'm having second thoughts.
It'd be best to make this right in this PR, otherwise we'll risk having a subtle behavior change later. We should inspect policy to decide if we actually need to override it. E.g. I'd expect that we don't need to override return_value_policy::copy.
There was a problem hiding this comment.
Using the original policy doesn't break the tests locally (whereas the absence of the fixes causes the tests to crash) so this seems fine assuming the tests pass in CI.
There was a problem hiding this comment.
gpt-5.4 found a counterexample that fails if the policy isn't changed:
def test_non_smart_holder_member_type_with_smart_holder_owner_aliases_member():
obj = m.ShWithSimpleStructMember()
legacy = obj.legacy
legacy.value = 13
assert obj.legacy.value == 13
... well, actually, that's surprising to me? The documentation says that def_readwrite should use reference_internal by default.
There was a problem hiding this comment.
Here is what Cursor Opus 4.6 1M Thinking is saying (it seems believable to me; tracing through manually would take a good chunk of time):
The counterexample (found by Dustin via gpt-5.4)
def test_non_smart_holder_member_type_with_smart_holder_owner_aliases_member():
obj = m.ShWithSimpleStructMember()
legacy = obj.legacy
legacy.value = 13
assert obj.legacy.value == 13 # FAILS with policy pass-throughThis test expects obj.legacy to return a reference into the parent object
(so mutating the returned object mutates the member). With commit b299f32
("Use default policy"), the test fails because a copy is returned instead.
Root cause: return_value_policy_override
The full call chain for def_readwrite with a smart_holder parent and a
unique_ptr-held member type:
-
def_readwritespecifiesreturn_value_policy::reference_internal
(pybind11.hline 2371). -
The smart_holder getter (
property_cpp_function_sh_member_held_by_value::read())
returnsstd::shared_ptr<D>by value — an aliasingshared_ptrthat
points into the parent object. -
At call time, pybind11's dispatch applies:
return_value_policy policy = return_value_policy_override<Return>::policy(call.func.policy);(
pybind11.hline 414) -
return_value_policy_overridehas a specialization for types whose caster
inherits fromtype_caster_generic(cast.hlines 1632–1641):static return_value_policy policy(return_value_policy p) { return !std::is_lvalue_reference<Return>::value && !std::is_pointer<Return>::value ? return_value_policy::move : p; }
Since
std::shared_ptr<D>(returned by value) is neither an lvalue reference
nor a pointer, the policy is unconditionally overridden tomove. -
So by the time
copyable_holder_caster::cast()is entered,policyis
move, notreference_internal. The original intent is lost.
What happens with each variant of the fallback
Hardcoded reference_internal (original PR #6008 code)
if (parent) {
return type_caster_base<type>::cast(
srcs, return_value_policy::reference_internal, parent);
}In type_caster_generic::cast(), the reference_internal case
(type_caster_base.h lines 1091–1094):
case return_value_policy::reference_internal:
valueptr = src; // raw pointer into the parent
wrapper->owned = false;
keep_alive_impl(inst, parent);
break;Result: the returned Python object is a reference to the member inside the
parent. Mutations are visible through the parent. The parent is kept alive.
Correct def_readwrite semantics.
Pass-through policy (commit b299f32)
if (parent) {
return type_caster_base<type>::cast(srcs, policy, parent);
}policy is move (due to the override). In type_caster_generic::cast(),
the move case (type_caster_base.h lines 1070–1088):
case return_value_policy::move:
if (move_constructor) {
valueptr = move_constructor(src);
} else if (copy_constructor) {
valueptr = copy_constructor(src);
}
// ...
wrapper->owned = true;
break;Result: a new independent copy is created. owned = true means the holder
is constructed and will destroy it. Mutations to the copy are invisible to the
parent. Broken def_readwrite semantics.
Why this concern about silently overriding copy is moot
The worry was: "We should inspect policy to decide if we actually need to
override it. E.g. I'd expect that we don't need to override
return_value_policy::copy."
This concern doesn't apply because return_value_policy_override has already
stomped whatever policy was originally specified to move, unconditionally, for
any by-value shared_ptr return. The policy variable arriving at
copyable_holder_caster::cast() is always move for this code path. There is
no copy to preserve.
The only way policy could be something other than move here is if the
function's declared return type were const std::shared_ptr<D>& (an lvalue
reference), in which case the override would pass through the original policy.
But the smart_holder def_readwrite getter always returns shared_ptr<D> by
value.
Conclusion
Commit b299f32 should be reverted. The fallback path must hardcode
reference_internal:
if (parent) {
return type_caster_base<type>::cast(
srcs, return_value_policy::reference_internal, parent);
}This is correct because:
- For
def_readwrite(the primary use case): preserves reference semantics
despite the policy override stompingreference_internal→move. - For any other caller that reaches this fallback with a
parent: the old
behavior was UB (stuffing ashared_ptrinto aunique_ptrholder), so
reference_internalis a strict improvement regardless. - The
throwpath (no parent) is unchanged and catches the truly unsupported
case.
There was a problem hiding this comment.
I just realized, when Cursor formatted my previous comment, it omitted the "pre-existing design tension" phrase it showed me in the original response; I'm dumping it here raw:
Bottom line: The hardcoded reference_internal in the fallback-with-parent case was actually correct. By the time the policy reaches copyable_holder_caster::cast(), it has already been
stomped to move by return_value_policy_override. Passing it through destroys the reference_internal semantics that def_readwrite intended.
Dustin's surprise is justified -- def_readwrite does specify reference_internal, but the override silently changes it. This is a pre-existing design tension: return_value_policy_override
is designed to force move for by-value returns of registered types (which makes sense for normal function returns), but it's at odds with the smart_holder def_readwrite getter which
returns an aliasing shared_ptr by value specifically as a mechanism to carry a reference.
So b299f32 should be reverted (go back to hardcoding reference_internal in the fallback). Your original [T002] concern about silently overriding copy is moot because the policy arriving
here is always move (the override already stomped whatever was originally specified).
This reinforces what I was thinking for a long time, hand wavy: the existing return value policy manipulations are tricky. The conclusion is the same: hardcoding reference_internal is probably the best we can do.
This actually is checked at compile time: m.def("getSharedEnumAB", []() -> std::shared_ptr<EnumAB> {
return std::make_shared<EnumAB>();
});I did add a test for the "returning shared ptr when holder is unique_ptr" case though. |
| RuntimeError, | ||
| match="Unable to convert std::shared_ptr<T> to Python when the bound type does not use std::shared_ptr or py::smart_holder as its holder type", | ||
| ): | ||
| m.getSimpleStructAsShared() |
There was a problem hiding this comment.
Weird that this check sometimes doesn't occur on some compilers? Maybe we just omit this test.
There was a problem hiding this comment.
At first sight, it seems like -DPYBIND11_TEST_SMART_HOLDER=ON could be the culprit?
But hang on a sec, I'll post another comment. Maybe that one will make this moot (not sure).
There was a problem hiding this comment.
JIC you find it's not moot, and JIC it helps: PYBIND11_TEST_SMART_HOLDER (cmake) triggers PYBIND11_RUN_TESTING_WITH_SMART_HOLDER_AS_DEFAULT_BUT_NEVER_USE_IN_PRODUCTION_PLEASE (C++)
Description
Alternative fix for #6003, primarily made it because I'm not sure this is better and to check that all tests pass. See discussion over on that PR for background and analysis.
Suggested changelog entry:
Fix crash in def_readwrite for non-smart-holder properties of smart-holder classes